Recognition of non-domain phrases in automatically extracted lists of terms
Abstract
In this paper, we address the problem of recognizing non-domain phrases in terminology lists obtained with an automatic term extraction tool. We focus on identifying multi-word phrases that are general terms or discourse function expressions. We tested several methods based on comparing domain corpora, as well as a method based on the contexts of phrases identified in a large general-language corpus, and compared their results with manual annotation. The results show that the task is quite hard, as inter-annotator agreement is low. Several of the tested methods achieved similar overall results, although the order in which they ranked the phrases varied. The most successful method, with a precision of about 0.75 at the midpoint of the tested list, was the context-based method using a modified contextual diversity coefficient.
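The abstract names two families of scoring (comparison across domain corpora, and contexts in a general-language corpus) and evaluates each ranking by precision at the midpoint of the list. The Python sketch below illustrates these ideas under stated assumptions: `domain_spread` and `context_diversity` are hypothetical stand-ins, not the paper's methods or its modified contextual diversity coefficient (whose formula is not given here), and all phrases, contexts, and labels are toy data. Only `precision_at_half` follows directly from the evaluation described: the share of manually confirmed non-domain phrases in the top half of a ranked list.

```python
"""Illustrative sketch only: hypothetical scores for flagging non-domain phrases.

None of these functions reproduces the paper's exact formulas; the
'modified contextual diversity coefficient' is not defined in the abstract.
"""


def domain_spread(phrase, domain_corpora):
    """Fraction of domain corpora whose text contains the phrase.

    Assumption: a phrase spread across many unrelated domains is more likely
    to be general. Uses naive lowercase substring matching.
    """
    hits = sum(1 for text in domain_corpora if phrase.lower() in text.lower())
    return hits / len(domain_corpora)


def context_diversity(occurrences):
    """Hypothetical diversity score from a general-language corpus.

    `occurrences` maps each phrase to the (left_word, right_word) pairs seen
    around it. The score averages the left and right type/token ratios, so
    phrases used in many different contexts score close to 1.0.
    """
    scores = {}
    for phrase, contexts in occurrences.items():
        if not contexts:
            scores[phrase] = 0.0
            continue
        lefts = {left for left, _ in contexts}
        rights = {right for _, right in contexts}
        scores[phrase] = (len(lefts) + len(rights)) / (2 * len(contexts))
    return scores


def precision_at_half(ranked, gold_non_domain):
    """Precision of the top half of a ranked list against manual labels."""
    k = max(1, len(ranked) // 2)
    return sum(1 for p in ranked[:k] if p in gold_non_domain) / k


# Toy data (hypothetical): contexts observed around four candidate phrases.
occurrences = {
    "in this paper": [("as", "we"), ("described", "the"), ("shown", "a"), ("and", "it")],
    "heat exchanger": [("the", "is"), ("the", "was"), ("a", "is"), ("the", "is")],
    "as a result": [("and", "the"), ("so", "we"), ("then", "it"), ("but", "a")],
    "boiling point": [("the", "of"), ("the", "of"), ("its", "of"), ("the", "is")],
}
scores = context_diversity(occurrences)
# Most diverse (likely non-domain) phrases first; ties keep insertion order.
ranked = sorted(scores, key=scores.get, reverse=True)
gold = {"in this paper", "as a result"}  # toy manual annotation
print(ranked)
print(precision_at_half(ranked, gold))
```

On this toy data the two discourse expressions score 1.0 and the two domain terms score 0.5, so the script prints `['in this paper', 'as a result', 'heat exchanger', 'boiling point']` and then `1.0`. Real evaluations would use far longer lists, where, as the abstract notes, low inter-annotator agreement makes the gold labels themselves uncertain.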
Publication date: 2016